Knowledge-Based Multilingual Document Analysis
Authors
Abstract
The growing availability of multilingual resources, like EuroWordnet, has recently inspired the development of large-scale linguistic technologies, e.g. multilingual IE and Q&A, that were considered infeasible until a few years ago. This paper presents a system for categorisation and automatic authoring of news streams in different languages. The system adopts a knowledge-based approach to Information Extraction to support hyperlinking. Authoring across documents in different languages is triggered by Named Entity and event recognition. Events in texts are matched by discourse processing driven by a large-scale world model. This kind of multilingual analysis relies on a lexical knowledge base of nouns (i.e. the EuroWordnet Base Concepts) shared among the English, Spanish and Italian lexicons. The impact of the design choices on language independence, and the possibilities they open for automatic learning of the event hierarchy, are discussed.
Similar resources
Knowledge-Based Representation for Transductive Multilingual Document Classification
Multilingual document classification is often addressed by approaches that rely on language-specific resources (e.g., bilingual dictionaries and machine translation tools) to evaluate cross-lingual document similarities. However, the required transformations may alter the original document semantics, raising additional issues to the known difficulty of obtaining high-quality labeled datasets. T...
Semantic-Based Multilingual Document Clustering via Tensor Modeling
A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue we propose a ne...
Multilingual Document Classification via Transductive Learning
We present a transductive learning based framework for multilingual document classification, originally proposed in [7]. A key aspect in our approach is the use of a large-scale multilingual knowledge base, BabelNet, to support the modeling of different language-written documents into a common conceptual space, without requiring any language translation process. Results on real-world multilingu...
A Latent Semantic Indexing-based approach to multilingual document clustering
The creation and deployment of knowledge repositories for managing, sharing, and reusing tacit knowledge within an organization has emerged as a prevalent approach in current knowledge management practices. A knowledge repository typically contains vast amounts of formal knowledge elements, which generally are available as documents. To facilitate users' navigation of documents within a knowledge...
Clustering multilingual documents by estimating text-to-text semantic relatedness
This thesis is about multilingual document clustering through estimating semantic relatedness between multilingual texts. Specifically we focus on the task of clustering multilingual documents with very limited or no supervisory information. We present two approaches to address the problem: a comparable-corpora based approach and a web-searches based approach. Our first approach derives pairwi...
Multilingual Document Clustering Using Wikipedia as External Knowledge
This paper presents Multilingual Document Clustering (MDC) on comparable corpora. Wikipedia, a structured multilingual knowledge base, has been highly exploited in many monolingual clustering approaches and also in comparing multilingual corpora. But there is no prior work which studied the impact of Wikipedia on MDC. Here, we have made an in-depth study on availing Wikipedia in enhancing MDC p...
Publication date: 2002